Pruning Processes and a New Characterization of 
Convex Geometries 

Federico Ardila* Elitza Maneva^ 



Abstract 

We provide a new characterization of convex geometries via a multivariate version of an 
identity that was originally proved by Maneva, Mossel and Wainwright for certain combinatorial 
objects arising in the context of the k-SAT problem. We thus highlight the connection between 
various characterizations of convex geometries and a family of removal processes studied in the 
literature on random structures. 

1 Introduction 

This article studies a general class of procedures in which the elements of a set are removed one 
at a time according to a given rule. We refer to such a procedure as a removal process. If every 
element which is removable at some stage of the process remains removable at any later stage, 
we call this a pruning process. The subsets that one can reach through a pruning process have 
the elegant combinatorial structure of a convex geometry. Our first goal is to highlight the role 
of convex geometries in the literature on random structures, where many pruning processes have 
been studied without exploiting their connection to these objects. Our second contribution is a 
proof that a generalization of a polynomial identity, first obtained for a specific removal process in 
[17], provides a new characterization of pruning processes and of convex geometries. To prove this 
result we also show how a convex geometry is equivalent to a particular kind of interval partition 
of the Boolean lattice. 

Two equivalent families of combinatorial objects, known as convex geometries and antimatroids, 
were defined in the 1980s [U QT]. The fact that these objects can be characterized via pruning 
processes has been known since then. Some examples of pruning processes considered at that time 
are the removal of vertices of the convex hull of a set of points in $\mathbb{R}^n$, the removal of the leaves of 
a tree, and the removal of minimal elements of a poset. More recently various pruning processes 
have been studied in the literature on random structures, and referred to also as peeling, stripping, 
whitening, coarsening, identifying, etc. A typical example is the removal of vertices of degree less 
than k in the process of finding the k-core of a random (hyper)graph. 

In [17], a surprising identity was proved to hold for a particular removal process which arises 
in the context of the k-SAT problem. In this paper, we answer the question posed by Mossel 
[23] of characterizing the combinatorial structures that satisfy (the multivariate version of) that 
identity: they are precisely the convex geometries or equivalently the pruning processes. That is 



*Dept. of Mathematics, San Francisco State University, San Francisco, CA, USA. (federico@math.sfsu.edu) 
^IBM Almaden Research Center, San Jose, CA, USA. (enmaneva@us.ibm.com). 






the content of our main result, Theorem 3.1 and Corollary 3.2. It says that any pruning process has 
the following two properties, and that in fact either of these two properties characterizes pruning 
processes among removal processes. 



• Suppose there is a subset $S$ of elements that we do not wish to remove. Then there is a 
unique minimal set $\tau(S)$ achievable by the pruning process which contains $S$. 

• Suppose to each element $e$ corresponds a weight $p_e$. For every set $S$ reachable by the 
pruning process, whose set of removable elements is $U \subseteq S$, define the weight of $S$ to be 

\[ \prod_{e \notin S} p_e \prod_{e \in U} (1 - p_e). \]

Then the sum of the weights of all reachable sets is 1. 

Equivalently, after appropriate rewording, either of these properties characterizes convex 
geometries among set systems. 

Outline. In Section 2 we define precisely the equivalent concepts of convex geometry, antimatroid, 
and pruning process. We describe a few different ways of looking at these objects: as set systems, 
as languages generated by a set of circuits or pruning rules, and as lattices. These different 
representations bring forward the differences between various pruning processes that have been studied. 
Section 3 is devoted to our new characterization of convex geometries. Finally, in Section 4 we 
apply our results to the k-SAT problem and the distribution considered in [17], thereby generalizing 
Theorem 6 of [17]. 

Related work. Antimatroids and convex geometries were first identified in the context of lattice 
theory by Dilworth [7]. Since then they have appeared in a variety of combinatorial situations. 
Two particularly important treatments are Edelman and Jamison's convexity approach [8], and 
Korte and Lovász's greedoid approach [11]. Two good introductions to the subject are [3] and [12]. 

Two notable examples of pruning processes on random structures appear in the analysis of 
identifiable vertices in random hypergraphs [6], and of k-cores in random hypergraphs [22, 25]. 
Additionally, in the analysis of satisfiability problems such processes have appeared repeatedly - 
most notably for the pure-literal rule algorithm for k-SAT [5, 21, 22, 26], and the study of clustering 
of solutions for XOR-SAT [20] and k-SAT [HQS]. 

Pruning processes appear also in practical applications; for example in error-correcting codes 
such as LDPC codes [9, 16] and LT codes [14] over the erasure channel. A unified analysis of the 
pruning processes in error-correcting codes and the pure-literal rule is provided in [15]. 

Our work, and in particular the implication 1 ⇒ 3 of our main Theorem 3.1, is related to 
previous work of Aivaliotis, Gordon, and Graveman [2] and Gordon [10]. For more information on 
this connection, see Section 3.1. 

A word on terminology. The objects of this paper have been studied under several different 
names. In particular, other authors have referred to pruning processes as shelling processes. We 
prefer to avoid this name, which may lead to confusion with the other, better established notion 
of shelling in combinatorics. The term "pruning" is more accurate, since a pruning process is 
equivalent to the successive removal of outermost elements of a convex geometry. A good example 
to keep in mind throughout the paper is the process of pruning of a tree by successively removing 
its leaves. 

2 Convex geometries and antimatroids 



Convex geometries and antimatroids are equivalent families of combinatorial objects. Convex 
geometries provide a combinatorial abstraction of the notion of convexity. Antimatroids describe 
pruning processes, where we remove elements from (or add elements to) a set one at a time, and 
once an element becomes available for removal, it remains available until it is removed. There are 
many equivalent definitions of these objects and a vast underlying theory [3, 12]. We now present 
four points of view which we will use. 

2.1 Convex sets and closure operations 

A convex geometry is a pair $(E, \mathcal{N})$ where $E$ is a set and $\mathcal{N} \subseteq 2^E$ is a collection of subsets of $E$ 
satisfying: 

(N1) $E \in \mathcal{N}$. 

(N2) If $A, B \in \mathcal{N}$ then $A \cap B \in \mathcal{N}$. 

(N3) For every $A \in \mathcal{N}$ with $A \neq E$ there is an $x \notin A$ such that $A \cup x \in \mathcal{N}$. 

The sets in $\mathcal{N}$ are called closed or convex. It is sometimes convenient to think of $\mathcal{N}$ as a poset 
ordered by containment; this is a lattice. We can then think of (N3) as a property of accessibility 
from the top: every closed set can be obtained from $E$ by removing one element at a time in such 
a way that every intermediate set in the process is also closed. 

The closure of $A \subseteq E$ is defined to be 

\[ \tau(A) = \bigcap_{C \in \mathcal{N} : C \supseteq A} C, \]

which is the minimum closed set containing $A$. It is easy to see that $\tau$ is, in fact, a closure operator; 
that is, for all $A$ we have $A \subseteq \tau(A)$ and $\tau(\tau(A)) = \tau(A)$, and for all $A \subseteq B$ we have $\tau(A) \subseteq \tau(B)$. 
Also, a set $A$ is closed if and only if $\tau(A) = A$. 
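
The axioms and the closure operator can be checked mechanically on small examples. The following is a minimal sketch, not part of the original development: the function names `is_convex_geometry` and `closure` are hypothetical, and the family `N` is an illustrative example (the vertex sets reachable from a three-vertex path by repeatedly deleting a vertex of degree at most 1).

```python
def is_convex_geometry(E, N):
    """Check axioms (N1)-(N3) for a family N of frozensets over the ground set E."""
    E, N = frozenset(E), set(N)
    if E not in N:                                     # (N1)
        return False
    if any((A & B) not in N for A in N for B in N):    # (N2)
        return False
    # (N3): every closed set other than E can be enlarged by one element.
    return all(any((A | {x}) in N for x in E - A) for A in N if A != E)


def closure(N, A):
    """tau(A): the intersection of all closed sets that contain A."""
    A = frozenset(A)
    return frozenset.intersection(*[C for C in N if A <= C])


# Hypothetical example: vertex sets reachable from the path a - b - c
# by repeatedly deleting a vertex of degree at most 1.
E = frozenset("abc")
N = {frozenset(s) for s in ["abc", "ab", "bc", "a", "b", "c", ""]}

print(is_convex_geometry(E, N))                   # the three axioms hold here
print(sorted(closure(N, "ac")))                   # smallest closed set containing a and c
print(is_convex_geometry(E, N - {frozenset()}))   # without the empty set, (N2) fails
```

In this example the set {a, c} is not closed, because the middle vertex b cannot be deleted while both of its neighbours are present.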

Example 2.1. For a given graph, consider the subgraphs that can be obtained by successively 
removing leaves (nodes of degree 1). The vertex sets of these subgraphs are the closed sets of a 
convex geometry. The minimal closed set is the 2-core of the graph. Figure 1 shows a specific graph 
and its lattice of closed sets; for example, the closure of the set {a, f, g} is the set {a, c, e, f, g}. 

An extreme point of a set $A$ is an element $a \in A$ which is not in the closure of $A - a$. The set 
ex(A) of extreme points of $A$ is the unique minimal set whose closure is $A$. 

2.2 Antimatroids and pruning processes 

Next we define antimatroids, which are equivalent to convex geometries. Let $E$ be a set, whose 
elements we regard as letters. A word over the alphabet $E$ is called simple if it contains no repeated 
letters; let $E^*$ be the set of simple words over $E$. 

An antimatroid is a pair $(E, \mathcal{L})$ where $E$ is a set and $\mathcal{L} \subseteq E^*$ is a set of simple words satisfying: 
(L1) If $\alpha\beta \in \mathcal{L}$ then $\alpha \in \mathcal{L}$; that is, any beginning section of a word of $\mathcal{L}$ is in $\mathcal{L}$. 
(L2) If $\alpha, \beta \in \mathcal{L}$ and $|\alpha| > |\beta|$, then $\alpha$ contains a letter $x$ such that $\beta x \in \mathcal{L}$. 
(L3) If $\alpha, \beta \in \mathcal{L}$ and $x \in E$ are such that $\alpha x, \alpha\beta \in \mathcal{L}$ and $x \notin \beta$, then $\alpha\beta x \in \mathcal{L}$. 

Figure 1: The set of configurations reachable by the process that successively removes leaves from 
a graph. The bottom configuration is the 2-core of the graph. 



Axiom (L1) says that $\mathcal{L}$ is left hereditary, (L2) is an exchange axiom, and (L3) states that, as 
we build up a word of $\mathcal{L}$ from left to right, any letter which can be added to the word at a certain 
stage can still be added at any later stage. 

The supports of the words in $\mathcal{L}$ are called the feasible subsets of $E$. The feasible subsets 
determine $\mathcal{L}$: a word is in $\mathcal{L}$ if and only if every initial segment of it is a feasible set. The following 
theorem provides a one-to-one correspondence between antimatroids and convex geometries. 

Theorem 2.2. [12, Theorem III.1.3] Let $E$ be a finite set and $\mathcal{F}$ be a collection of subsets of $E$. 
Then $\mathcal{F}$ is the collection of feasible sets of an antimatroid if and only if $\mathcal{F}^* = \{E - F : F \in \mathcal{F}\}$ is 
the collection of closed sets of a convex geometry. 

By the above theorem, the feasible sets of the antimatroid corresponding to our example of a 
convex geometry can be read on the descending paths from the top of the lattice of closed sets. A 
letter corresponds to every edge - this is the element that is removed. Any set of removed elements 
is the complement of a convex set, thus it is a feasible set. We will see below that the words of the 
antimatroid can also be read on the descending paths. 

An alternative characterization of antimatroids starts by defining a set $H(x) \subseteq 2^{E - x}$ for every 
$x \in E$, which is a collection of alternative precedences for $x$; each precedence is a set not containing 
$x$. Let $\mathcal{L}$ be the set of words on the alphabet $E$ such that $x$ can only appear after at least one of 
its precedences has appeared: 

\[ \mathcal{L} = \{x_1 \ldots x_k : \text{for all } i \text{ there is a set } A \in H(x_i) \text{ with } A \subseteq \{x_1, \ldots, x_{i-1}\}\}. \]

In the example of Figure 1, the alternative precedences for c are {a, b}, {a, e}, and {b, e}. In 
general, for the process of removing leaves to obtain the 2-core of a graph, a vertex of degree d 
becomes removable as soon as d − 1 of its neighbors are removed. Thus its precedences are all the 
(d − 1)-subsets of its set of d neighbors. 
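
The description by alternative precedences translates directly into a word generator. The sketch below is illustrative only: the helper names are hypothetical, the graph is a made-up star with centre c and leaves a, b, e (not the graph of Figure 1), and isolated vertices are assumed to be removable.

```python
from itertools import combinations


def leaf_removal_precedences(adj):
    """Alternative precedences for leaf removal: the (d-1)-subsets of each vertex's d neighbours."""
    H = {}
    for v, nbrs in adj.items():
        size = max(len(nbrs) - 1, 0)   # assumption: isolated vertices are always removable
        H[v] = [frozenset(c) for c in combinations(sorted(nbrs), size)]
    return H


def antimatroid_words(E, H):
    """All simple words in which every letter appears after at least one of its precedences."""
    words = []

    def extend(word, used):
        words.append(word)
        for x in sorted(E - used):
            if any(A <= used for A in H[x]):
                extend(word + x, used | {x})

    extend("", frozenset())
    return words


# Hypothetical graph: a star with centre c and leaves a, b, e.
adj = {"c": "abe", "a": "c", "b": "c", "e": "c"}
H = leaf_removal_precedences(adj)
words = antimatroid_words(frozenset(adj), H)
print(H["c"])                            # the three 2-subsets of {a, b, e}
print("abc" in words, "ac" in words)     # c needs two removed neighbours first
```

The centre c of the star has the same three precedences as the vertex c in the text, so the word abc is allowed while ac is not.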

A removal process is a procedure in which the elements of a set are removed one at a time 
according to a given rule. If every element which is removable at some stage of the process remains 
removable at any later stage, we call this a pruning process. 

We are now in a position to explain the correspondence between pruning processes and 
antimatroids. Given an antimatroid $\mathcal{L}$ on ground set $E$, we can consider each word $w$ of $\mathcal{L}$ as a 
removal sequence, which instructs us to remove the elements of $w$ from left to right. These removal 
sequences describe a pruning process: if an element $x$ is removable at a certain stage described by 
word $w \in \mathcal{L}$ (that is, if $wx \in \mathcal{L}$), then one of the alternative precedences of $x$ appears in $w$. At 
any later stage the removal sequence $w'$ will contain $w$ as a prefix, and therefore will contain that 
alternative precedence for $x$ as well. 

Conversely, suppose we are given a pruning process on a set E. For each element x let the 
alternative precedences of $x$ be the subsets $A \subseteq E$ such that $x$ is removable in $E - A$. Clearly the 
antimatroid determined by these sets of alternative precedences consists of the removal sequences 
in our pruning process. 

Example 2.1 gives rise to an antimatroid whose words are the pruning sequences of leaves that 
one can successively remove from the graph. These correspond to the descending paths from the 
top of the lattice of closed sets; the word indicates the elements that are being removed as we 
walk down. For example, the leftmost path in the lattice of Figure 1 gives the word dabc, which 
corresponds to a valid order of successively removing leaves from the graph. 

2.3 Circuits and Paths 

The circuit description of a convex geometry is a good way to reveal the pruning process which 
generates it. A rooted set is a set with a designated element called the root. Any collection of rooted 
sets gives rise to a convex geometry as follows. Suppose $\mathcal{C}$ is a family of rooted subsets of $E$; label 
them $(A_i, a_i)$ where $a_i \in A_i$ and $A_i \subseteq E$. Call a subset $S$ of $E$ full if $A_i - a_i \subseteq S$ implies $a_i \in S$ for 
each $i$; that is, if the root of a set is never the only element of the rooted set missing from $S$. Say a 
full set $S$ is accessible from $E$ if there exists a sequence of full subsets $E = S_0 \supset S_1 \supset \cdots \supset S_k = S$ 
with $|S_i - S_{i+1}| = 1$ for each $i$. The following is, in a different language, Lemma 3.7 of [12]. 

Proposition 2.3. [12] Let $\mathcal{C}$ be a collection of rooted sets in $E$. The collection $\mathcal{N}(\mathcal{C})$ of full subsets 
of $E$ which are accessible from $E$ is the collection of closed sets of a convex geometry on $E$. 

The set of rooted sets can be interpreted as pruning rules. For every $e \in E$, let 
$\mathcal{C}_e = \{C \setminus \{e\} : (C, e) \in \mathcal{C}\}$. Then the corresponding pruning process is the one in which an element $e$ is removable 
if and only if at least one element has been removed from each set in $\mathcal{C}_e$. 
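
The construction of $\mathcal{N}(\mathcal{C})$ can be sketched as a search from $E$ through full sets. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the input is the pair of rooted sets (abc, b) and (bde, d) that Section 2.4 uses as an example.

```python
def is_full(S, rooted_sets):
    """S is full if no rooted set (A, a) has A - a inside S while the root a is missing."""
    return all(a in S or not (A - {a} <= S) for A, a in rooted_sets)


def closed_sets(E, rooted_sets):
    """Full subsets of E reachable from E through full sets, removing one element at a time."""
    E = frozenset(E)
    reached, frontier = {E}, [E]
    while frontier:
        S = frontier.pop()
        for x in S:
            T = S - {x}
            if T not in reached and is_full(T, rooted_sets):
                reached.add(T)
                frontier.append(T)
    return reached


# Rooted sets (abc, b) and (bde, d): b is removable once a or c is gone,
# and d is removable once b or e is gone.
rooted = [(frozenset("abc"), "b"), (frozenset("bde"), "d")]
N = closed_sets("abcde", rooted)

# Closure of {a, c, e}: the intersection of all closed sets containing it.
tau_ace = frozenset.intersection(*[C for C in N if frozenset("ace") <= C])
print(len(N))
print("d" in tau_ace)   # d lies in the closure of {a, c, e}
```

The last line reflects the fact that d is forced by {a, c, e}: a and c force b, and then b and e force d.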

Conversely, from a convex geometry it is possible to recover a set of rooted sets that generates 
it. A free set is one of the form ex(A). A circuit is a minimal set which is not free, and one can 
check that each circuit C has a unique element a which is in the closure of the remaining ones. 
This element is called the root of the circuit, and (C, a) is called a rooted circuit. 

Even though we will not need this fact, let us point out that the collection of full sets of a family 
of rooted sets has a nice structure. 

Proposition 2.4. A collection $\mathcal{F} \subseteq 2^E$ is the collection of full sets determined by a family of rooted 
sets if and only if it contains $E$ and is closed under intersection. 

Proof. First, it is easy to see that the collection of full sets determined by a family of rooted sets 
$\mathcal{C}$ contains $E$ and is closed under intersection. 

Next, suppose $\mathcal{F}$ is a collection of subsets of $E$ that contains $E$ and is closed under intersection. 
We start with $\mathcal{C}$ being the complete set of rooted sets on $E$. For every $F \in \mathcal{F}$ remove from $\mathcal{C}$ all 
rooted sets $(A \cup a, a)$, where $A \subseteq F$ and $a \notin F$. We claim that $\mathcal{F}$ is the collection of full sets of $\mathcal{C}$. 

It is immediate by the construction that every $F \in \mathcal{F}$ is a full set. It remains to show that 
there are no other full sets. Suppose $D \subseteq E$ is a full set for $\mathcal{C}$. That means that all rooted sets 
$(A \cup a, a)$ with $A \subseteq D$ and $a \notin D$ have been removed. In particular $(D \cup a, a)$ has been removed 
for all $a \notin D$. This implies that for every $a \notin D$ there exists $F_a \in \mathcal{F}$ such that $D \subseteq F_a$ and $a \notin F_a$. 
Since $D = \bigcap_{a \notin D} F_a$ and $\mathcal{F}$ is closed under intersection, $D$ is in $\mathcal{F}$. □ 

So far in this paper, rooted sets have played the role of circuits in a convex geometry. It is worth 
pointing out, however, that one can consider rooted sets as paths which generate a convex geometry 
in a different way, as follows. Suppose $\mathcal{P}$ is a family of rooted subsets of $E$ which we now call paths; 
label them $(P_i, p_i)$ where $p_i \in P_i$ and $P_i \subseteq E$. Let a subset $S$ of $E$ be path-full if for every $e \in S$ 
there exists $(P, e) \in \mathcal{P}$ such that $P \subseteq S$. Let it be path-closed if it is path-full, and accessible from 
$E$ by a sequence $E = S_0 \supset S_1 \supset \cdots \supset S_k = S$ of path-full subsets with $|S_i - S_{i+1}| = 1$ for each 
$i$. Then the path-closed sets are the closed sets of a convex geometry, and every convex geometry 
arises in this way from a set of paths. 

In this context, the rooted sets can again be interpreted as pruning rules. For every $e \in E$ let 
$\mathcal{P}_e = \{P \setminus \{e\} : (P, e) \in \mathcal{P}\}$. Then the corresponding pruning process is the one in which an element 
$e$ is removable if and only if there is a set $P \in \mathcal{P}_e$ such that every element of $P$ has been removed. 

The pruning processes in the literature on random structures are generated by rules for removing 
elements which can usually be represented in a natural way through circuits or paths. While both 
points of view are equivalent, sometimes one is more natural than the other. For example, in finding 
the k-core of a graph [25], a vertex becomes removable when at least one element has been removed 
from every k-subset of its neighbors (circuit rule). On the other hand, in the case of identifiable 
vertices in hypergraphs [6] a vertex is removable if, in at least one of the hyperedges in which it 
appears, every other vertex has been removed (path rule). 

In the rest of the paper, all the rooted sets that appear play the role of circuits. 

2.4 Lattices 

Finally, we can also think of convex geometries as meet-distributive lattices. A lattice $L$ is 
meet-distributive if for any element $x \neq \hat{0}$ the interval $[m(x), x]$ is a Boolean lattice, where $m(x)$ is the 
meet of the elements covered by $x$. Meet-distributive lattices are precisely the posets of closed sets 
of convex geometries [3, Prop. 8.7.5]. 

One might wonder whether something more specific can be said about the convex geometries 
that arise from a set of circuits each of size at most k. This question will be particularly natural 
in Section 4, where convex geometries are applied to the k-SAT problem for a fixed value of k. 

For k = 2 the situation is very nice. Recall that a lattice L is distributive if x A (y V z) = 
(x A y) V (x A z) for any x, y, z in L. 

Proposition 2.5. Cor. 3.10] Let $\mathcal{C}$ be a set of rooted sets of size 2. The convex geometry 
$\mathcal{N}(\mathcal{C})$ generated by these sets is a distributive lattice. Conversely, every distributive lattice arises 
in this way. 

For higher values of k, if we start with a collection $\mathcal{C}$ of rooted circuits of size k, the resulting 
convex geometry $\mathcal{N}$ generally has additional rooted circuits of different sizes. For the case of 
k = 2, all circuits of the generated convex geometry have size 2. However, for example, the convex 
geometry defined by the rooted 3-sets (abc, b) and (bde, d) also has (acde, d) as a circuit. 

If we have a bound on the size of the circuits of a convex geometry, we can make the following 
statement. A lattice $L$ is k-distributive if $x \wedge (y_0 \vee \cdots \vee y_k) = (x \wedge y_0) \vee \cdots \vee (x \wedge y_k)$ for any 
$x, y_0, \ldots, y_k$ in $L$. 

Proposition 2.6. [13, Cor. 4.3] Let $k \geq 3$ be an integer. If all circuits of a convex geometry 
have size at most k, then its poset of closed sets is a (k − 1)-distributive lattice. Not every 
(k − 1)-distributive lattice arises in this way. 

3 Convex geometries as interval partitions of Boolean lattices 

In this section we describe our new characterization of convex geometries. We show that convex 
geometries on a set $E$ are characterized by the fact that they induce a certain partition of the 
Boolean lattice $2^E$. This partition is encoded in a polynomial identity which, as we will later see, 
generalizes Theorem 4.2 from [17]. 

For any collection $\mathcal{N} \subseteq 2^E$ of subsets of $E$, and a set $A$ in $\mathcal{N}$, say that an element $a \in A$ is 
excludable from $A$ if $A - a$ is in $\mathcal{N}$. Let ex(A) be the set of excludable elements of $A$. When $\mathcal{N}$ is 
the collection of closed sets of a convex geometry, ex(A) is the set of extreme points of $A$. 

Theorem 3.1. Let $\mathcal{N} \subseteq 2^E$ be a collection of subsets of a non-empty set $E$. The following 
statements are equivalent: 

1. $\mathcal{N}$ is the collection of closed sets of a convex geometry. 

2. As $A$ ranges over $\mathcal{N}$ the intervals $[\mathrm{ex}(A), A]$ partition the Boolean lattice $2^E$; that is, for every 
$D \subseteq E$ there is a unique $A \in \mathcal{N}$ such that $\mathrm{ex}(A) \subseteq D \subseteq A$. 

3. For any collection of $p_i$ and $q_i$ for $i \in E$ such that $p_i + q_i = 1$ for all $i$, we have 

\[ \sum_{A \in \mathcal{N}} \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j = 1. \]

Proof. 1. implies 2. Notice that if $A \in \mathcal{N}$ is such that $\mathrm{ex}(A) \subseteq D \subseteq A$, then we have that 
$A = \tau(\mathrm{ex}(A)) \subseteq \tau(D) \subseteq \tau(A) = A$; so the only possible choice for $A$ is $A = \tau(D)$. It remains 
to notice that, since $\mathrm{ex}(A)$ is the unique minimal set such that $A = \tau(\mathrm{ex}(A))$, and $A = \tau(D)$, it 
follows that $D \supseteq \mathrm{ex}(A)$, and therefore $D \in [\mathrm{ex}(A), A]$. 

2. implies 1. We define the map $\phi : 2^E \to \mathcal{N}$ as follows: for every $D \subseteq E$, let $\phi(D)$ be the 
unique element $A \in \mathcal{N}$ such that $\mathrm{ex}(A) \subseteq D \subseteq A$. We need to show axioms (N1)-(N3) of a convex 
geometry $(E, \mathcal{N})$: $E$ is in $\mathcal{N}$, $\mathcal{N}$ is closed under intersection, and every $A \in \mathcal{N}$ is accessible from $E$. 

Axiom (N1) obviously holds, because $\phi(E) = E$ is in $\mathcal{N}$. To show (N3), we show that every set 
$A \in \mathcal{N}$ is accessible from any superset $B \supsetneq A$ that also belongs to $\mathcal{N}$. It suffices to prove that there 
exists an element of $B \setminus A$ that is excludable from $B$. If that were not the case, then $\mathrm{ex}(B) \subseteq A$, 
and both of the intervals $[\mathrm{ex}(A), A]$ and $[\mathrm{ex}(B), B]$ would contain $A$, a contradiction. 

Finally, we need to prove (N2), which states that $\mathcal{N}$ is closed under intersection. First we prove 
the following statement: 

If $B \in \mathcal{N}$ and $A \subseteq B$ then $\phi(A) \subseteq B$. 

Suppose that we remove one element at a time from $B$ in an arbitrary way, with the restriction 
that the intermediate sets in the process must all be in $\mathcal{N}$ and contain $A$. We keep doing this until 
we cannot continue anymore; suppose the set we obtain is $C$; by construction, $B \supseteq C \supseteq A$. That 
means that every element excludable from $C$ is in $A$, so $\mathrm{ex}(C) \subseteq A \subseteq C$. Thus $C = \phi(A)$ and we 
obtain the desired statement. 

Now suppose $A_1$ and $A_2$ are in $\mathcal{N}$. From $A_1 \cap A_2 \subseteq A_1$ we obtain that $\phi(A_1 \cap A_2) \subseteq A_1$. 
Similarly $\phi(A_1 \cap A_2) \subseteq A_2$, so $\phi(A_1 \cap A_2) \subseteq A_1 \cap A_2$. But the reverse inclusion holds by definition, 
so we must have equality. It follows that $A_1 \cap A_2$ is in $\mathcal{N}$. 

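2. implies 3. Observe that 

\[ \sum_{D \subseteq E} \prod_{i \notin D} p_i \prod_{j \in D} q_j = \prod_{h \in E} (p_h + q_h) = 1. \]

Therefore, since for every $D$ there is a unique $A$ such that $\mathrm{ex}(A) \subseteq D \subseteq A$, it suffices to prove that 
for every $A \in \mathcal{N}$: 

\[ \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j = \sum_{D \in [\mathrm{ex}(A), A]} \prod_{i \notin D} p_i \prod_{j \in D} q_j. \]

This is easily seen to be true because: 

\begin{align*}
\sum_{D \in [\mathrm{ex}(A), A]} \prod_{i \notin D} p_i \prod_{j \in D} q_j
  &= \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j \sum_{R \subseteq A \setminus \mathrm{ex}(A)} \Bigl( \prod_{i \in A \setminus (\mathrm{ex}(A) \cup R)} p_i \prod_{j \in R} q_j \Bigr) \\
  &= \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j \prod_{h \in A \setminus \mathrm{ex}(A)} (p_h + q_h) \\
  &= \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j.
\end{align*}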

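3. implies 2. Consider any set $D \subseteq E$, and let $p_a = 0$ if $a \in D$ and $p_a = 1$ otherwise. The equality 
becomes: 

\[ 1 = \sum_{A \in \mathcal{N}} \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j = \sum_{A \in \mathcal{N} : \mathrm{ex}(A) \subseteq D \subseteq A} 1. \]

Therefore there is exactly one set $A \in \mathcal{N}$ for which $\mathrm{ex}(A) \subseteq D \subseteq A$. □ 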
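
Properties 2 and 3 of the theorem can be verified by brute force on a small set system. The sketch below is only an illustration: the helper names and the weights are hypothetical, and the family `N` is the example of closed sets obtained from a three-vertex path a - b - c by deleting vertices of degree at most 1. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def ex(N, A):
    """Excludable elements of A: those a for which A - a is again in N."""
    return frozenset(a for a in A if A - {a} in N)


def subsets(E):
    E = sorted(E)
    return [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]


def intervals_partition(E, N):
    """Property 2: every D lies in exactly one interval [ex(A), A]."""
    return all(sum(ex(N, A) <= D <= A for A in N) == 1 for D in subsets(E))


def weight_sum(E, N, p):
    """Left-hand side of property 3, with q_i = 1 - p_i."""
    return sum(prod(p[i] for i in E - A) * prod(1 - p[j] for j in ex(N, A)) for A in N)


E = frozenset("abc")
N = {frozenset(s) for s in ["abc", "ab", "bc", "a", "b", "c", ""]}
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 5)}   # arbitrary weights

print(intervals_partition(E, N), weight_sum(E, N, p))

# Dropping the empty set destroys closure under intersection,
# and the interval partition fails with it.
print(intervals_partition(E, N - {frozenset()}))
```

For the non-example, the empty set D is contained in the three intervals of {a}, {b} and {c}, so uniqueness in property 2 fails.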

The following corollary gives a characterization of pruning processes among removal processes: 

Corollary 3.2. A removal process on a set $E$ is a pruning process if and only if, for each subset $S$ 
of $E$, there is a unique minimal set $\tau(S)$ containing $S$ which is achievable by the removal process. 

Proof. As outlined in Section 2.2, a pruning process gives rise to a convex geometry, and in that case 
$\tau(S)$ is just the convex closure of $S$. For the other direction, let $\mathcal{N}$ consist of the sets achievable 
by the removal process; it suffices to show that $\mathcal{N}$ is the collection of closed sets of a convex 
geometry. We will show that property 2 of Theorem 3.1 holds. For any set $S \subseteq E$, it holds that 
$\mathrm{ex}(\tau(S)) \subseteq S \subseteq \tau(S)$, because if there is an excludable element of $\tau(S)$ that is not in $S$, then $\tau(S)$ 
would not be the minimal set containing $S$. Furthermore, any set $T \in \mathcal{N}$ for which $\mathrm{ex}(T) \subseteq S \subseteq T$ 
is a minimal set containing $S$ because all of its excludable elements are in $S$. Since there is a unique 
such set, it is $\tau(S)$. □ 

We conclude this section by offering a probabilistic interpretation of property 3 of Theorem 3.1. 

3.1 A probabilistic interpretation 



Let $(E, \mathcal{N})$ be a convex geometry, and fix $0 < p_e, q_e < 1$ with $p_e + q_e = 1$ for each element $e$ of $E$. 
Define a probability distribution $\pi_1$ on the subsets of $E$ by independently deleting element $e$ with 
probability $p_e$ and keeping it with probability $q_e$: 

\[ \pi_1(A) = \prod_{i \notin A} p_i \prod_{j \in A} q_j, \qquad A \subseteq E. \]

Define a probability distribution $\pi_2$ on the convex sets of $E$ by: 

\[ \pi_2(A) = \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j, \qquad A \in \mathcal{N}. \]

The implication 1 ⇒ 3 of Theorem 3.1 tells us that $\pi_2$ is, indeed, a probability distribution. 
Furthermore, in order to sample from $\pi_2$, it suffices to sample from $\pi_1$ and compute the closure of 
the obtained set. 

Theorem 3.3. Let $f(A)$ be a function defined on the subsets $A$ of $E$ which depends only on $\tau(A)$. 
Then the expected value of $f$ when we sample from the distribution $\pi_1$ on all subsets of $E$ equals 
the expected value of $f$ when we sample from the distribution $\pi_2$ on the convex sets of $E$. 

Proof. Assuming that $f(D) = f(\tau(D))$, the identity 

\[ \sum_{D \subseteq E} f(D) \prod_{i \notin D} p_i \prod_{j \in D} q_j = \sum_{A \in \mathcal{N}} f(A) \prod_{i \notin A} p_i \prod_{j \in \mathrm{ex}(A)} q_j \]

can be established in exactly the same way as implication 1 ⇒ 3 of Theorem 3.1. □ 
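
The sampling statement can be checked exactly on a small example by pushing $\pi_1$ forward through the closure map and comparing with $\pi_2$. The sketch below is illustrative: the helper names, the family `N` (closed sets of a three-vertex path under deletion of vertices of degree at most 1) and the weights are all assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def ex(N, A):
    """Excludable elements of A: those a for which A - a is again in N."""
    return frozenset(a for a in A if A - {a} in N)


def closure(N, D):
    """tau(D): the intersection of all closed sets containing D."""
    return frozenset.intersection(*[C for C in N if D <= C])


E = frozenset("abc")
N = {frozenset(s) for s in ["abc", "ab", "bc", "a", "b", "c", ""]}
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 5)}   # deletion probabilities

# pi_1 on all subsets, pushed forward through the closure operator.
pushforward = {A: Fraction(0) for A in N}
for r in range(len(E) + 1):
    for c in combinations(sorted(E), r):
        D = frozenset(c)
        pi1_D = prod(p[i] for i in E - D) * prod(1 - p[j] for j in D)
        pushforward[closure(N, D)] += pi1_D

# pi_2 on the closed sets.
pi2 = {A: prod(p[i] for i in E - A) * prod(1 - p[j] for j in ex(N, A)) for A in N}

print(pushforward == pi2)
print(sum(pi2.values()))
```

Since the two dictionaries agree set by set, any function that depends only on the closure has the same expectation under both distributions in this example.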

We note that Aivaliotis, Gordon, and Graveman [2] and Gordon [10] studied the problem of 
choosing a random subset of a convex geometry under the distribution $\pi_1$. They related the 
expected rank of this random subset to the Tutte polynomial of the antimatroid. In particular, 
they discovered the identity in the proof of Theorem 3.3 in a special case which is no simpler than the general case. 

The fact that $\pi_2$ is a probability distribution on $\mathcal{N}$ is not explicitly stated in [2] or [10], and 
neither is the probabilistic interpretation of the right hand side of that identity. However, these two results 
follow very easily from that work. Our theorem that the probabilistic property of Theorem 3.3 (or 
the weaker condition 3 of Theorem 3.1) characterizes convex geometries is new. 



4 Convex geometries in the k-SAT problem 

Let $F$ be a Boolean formula such as 

\[ F = (\bar{x}_1 \vee \bar{x}_2 \vee x_3) \wedge (x_2 \vee \bar{x}_3 \vee \bar{x}_4). \]

We can assume that $F$ is written in conjunctive normal form as a conjunction of certain clauses $C$ 
in the variables $V$ and their negations. The Boolean satisfiability problem (SAT) is to determine 
whether there is some assignment of TRUE (1) and FALSE (0) to the variables which makes the 
entire formula true. The k-SAT problem is the same problem when restricted to formulas with 
clauses of a fixed size k. For k = 2 there is a polynomial time algorithm for deciding satisfiability; 
however, for k ≥ 3 the problem is NP-complete. 

In their analysis of the Survey Propagation algorithm |19|, 0] for 3-SAT, Maneva et al. [17] 
discovered a polynomial identity that holds for any SAT problem and any satisfying assignment. 
To define this identity we first need to introduce the concept of partial assignments, where each 
variable is assigned one of the values 0, 1, or *; the value * indicates that a variable is unassigned and 
free to take either value. Say that a partial assignment $x$ is invalid for a clause $C$ if plugging $x$ into $C$ 
gives either $0 \vee 0 \vee \cdots \vee 0$ (which makes the clause false) or $0 \vee \cdots \vee 0 \vee * \vee 0 \vee \cdots \vee 0$ (where the * is not 
free to take either value). A partial assignment $x$ is valid for a formula if it is valid for all its clauses. 
For example, some valid partial assignments for the formula $F = (\bar{x}_1 \vee \bar{x}_2 \vee x_3) \wedge (x_2 \vee \bar{x}_3 \vee \bar{x}_4)$ 
are (1, 1, 1, 1), (*, 1, *, *) and (1, *, *, 1), and some invalid partial assignments are (1, 1, 0, *) and 
(*, *, 1, 1). 

Definition 4.1. Given a Boolean formula $F$, the poset $P(F)$ of valid partial assignments is defined 
by decreeing that $a$ covers $b$ if $b$ is obtained from $a$ by switching a 0 or 1 to a *. 

Figure 2 shows part of the poset of valid partial assignments for the formula $F$ above. Note that 
in this example, somewhat surprisingly, (1, 1, 1, 1) is not greater than (1, *, *, 1) in $P(F)$, because 
to stay valid one must switch $x_2$ and $x_3$ from 1 to * simultaneously: (1, *, 1, 1) and (1, 1, *, 1) are 
invalid. 
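
The validity rule is easy to test by machine. The following is a minimal sketch with hypothetical names; clauses are encoded as lists of signed integers (a negative entry denotes a negated variable), and the clause signs used here, (¬x1 ∨ ¬x2 ∨ x3) ∧ (x2 ∨ ¬x3 ∨ ¬x4), are an assumption chosen so that the code reproduces the valid and invalid assignments listed in the text.

```python
STAR = "*"

# Assumed encoding of the example formula: -k stands for the negation of x_k.
F = [[-1, -2, 3], [2, -3, -4]]


def literal_value(lit, x):
    """Value (0, 1 or '*') of a signed literal under the partial assignment x."""
    v = x[abs(lit) - 1]
    if v == STAR:
        return STAR
    return v if lit > 0 else 1 - v


def is_valid(formula, x):
    """x is invalid for a clause whose literals are all 0, or all 0 except a single '*'."""
    for clause in formula:
        vals = [literal_value(lit, x) for lit in clause]
        if 1 not in vals and vals.count(STAR) <= 1:
            return False
    return True


valid_examples = [(1, 1, 1, 1), (STAR, 1, STAR, STAR), (1, STAR, STAR, 1)]
invalid_examples = [(1, 1, 0, STAR), (STAR, STAR, 1, 1), (1, STAR, 1, 1), (1, 1, STAR, 1)]

print([is_valid(F, x) for x in valid_examples])
print([is_valid(F, x) for x in invalid_examples])
```

The last two invalid examples are the ones that prevent (1, *, *, 1) from lying below (1, 1, 1, 1): starring $x_2$ alone or $x_3$ alone leaves a clause with a single star and no satisfied literal.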




Figure 2: Some of the valid partial assignments for the formula $(\bar{x}_1 \vee \bar{x}_2 \vee x_3) \wedge (x_2 \vee \bar{x}_3 \vee \bar{x}_4)$. 
Highlighted are the assignments below the satisfying assignment (1, 1, 1, 1). Edges are labeled with 
the index of the variable whose value differs in the two adjacent assignments. 



Definition 4.1 suggests that, given a valid partial assignment $a$, we call a coordinate $i$ either: 

(a) a star if $a_i = *$, 

(b) unconstrained if $a_i \in \{0, 1\}$ and setting $a_i = *$ keeps the assignment valid, or 

(c) constrained if $a_i \in \{0, 1\}$ and setting $a_i = *$ gives an invalid assignment; that is, if $a_i$ is the only 
satisfying variable in some clause of $F$. 

Let $S(a)$, $U(a)$, $C(a)$, and $N(a)$ be the sets of star, unconstrained, constrained, and numerical 
variables of $a$, respectively; so $V = S(a) \cup N(a)$ and $N(a) = U(a) \cup C(a)$. 
Maneva et al. [17] defined the weight of a partial assignment $a$ to be 

\[ W(a) = p^{|S(a)|} q^{|U(a)|}, \]
where $p$ and $q$ are parameters in the interval [0, 1]. They considered the probability distribution 
which assigns to $a$ a probability proportional to $W(a)$ for every valid partial assignment $a$. The 
survey propagation algorithm was then proved to be equivalent to applying the belief propagation 
marginalization heuristic [24] to this distribution with suitably chosen $p$ and $q$. This distribution 
has the following property, which should not look surprising in view of Theorem 3.1. 

Theorem 4.2. [17] For any satisfying assignment $a$ of a Boolean formula $F$ and $p + q = 1$, 

\[ \sum_{b \leq a} p^{|S(b)|} q^{|U(b)|} = 1, \]

summing over all valid partial assignments $b$ which are less than or equal to $a$ in $P(F)$; that is, summing over 
the subposet $P(F)_{\leq a}$. 

Thus the probability distribution on partial assignments with $p > 0$ may be regarded as a 
"smoother" version of the uniform distribution over satisfying assignments, which corresponds to 
the case $p = 0$: if we choose a valid partial assignment $b$ at random, then the probability of being 
under $a$ is the same for any satisfying assignment $a$. Another consequence of the above theorem 
is that if the total weight of all valid partial assignments is less than 1, then the formula has no 
satisfying assignment. In recent work of Sinclair and the second author [18], this fact was used in 
conjunction with the first-moment method to bound the probability of satisfiability of a random 
SAT formula with clauses of sizes 2 and 3. 

Consider the following experiment: 

1. in a valid assignment a, change a random unconstrained variable to *, and 

2. repeat until there are no unconstrained variables. 

This procedure has been referred to as "peeling", "whitening", "coarsening" and "pruning". We 
now recognize it as a pruning process on the set of variables which have numerical values in a. At 
each stage, we are allowed to remove an unconstrained variable; notice that if a variable becomes 
unconstrained, it remains unconstrained throughout this process. 

This experiment is equivalent to taking a random path from $a$ down the partial order $P(F)$, by 
choosing at each step a random partial assignment that is covered by the current one. For a fixed 
choice of a, any such path terminates at the same partial assignment, which is known as a "core". 
(Note, however, that different a may lead to different core assignments.) 

Achlioptas and Ricci-Tersenghi [1] examined the above pruning process and proved that, for 
k ≥ 9 and a formula chosen from a particular distribution of interest, there is a high probability 
that the process will terminate before removing all variables. This is not known to hold for k = 3. 

With the above description of the removal process, our next result follows easily. 

Theorem 4.3. Let $F$ be a SAT formula with variables $V$, and let $a$ be a valid (possibly partial) 
assignment for $F$. Let 

\[ \mathcal{N} = \{N(b) : b \text{ is a valid partial assignment such that } b \leq a\}. \]

Then $(N(a), \mathcal{N})$ is a convex geometry. Conversely, every convex geometry arises in this way from 
a valid assignment for a SAT formula. 

Proof. We show that this statement is equivalent to Proposition 2.3. Consider the clauses of $F$ 
with a unique satisfying variable in $a$, which give $0 \vee \cdots \vee 0 \vee 1 \vee 0 \vee \cdots \vee 0$ when we plug $a$ into 
them. If $C$ is the set of variables in such a clause (which must be a subset of $N(a)$) and $v$ is the 
unique satisfying variable, form a rooted set $(C, v)$. Then $(N(a), \mathcal{N})$ is clearly the convex geometry 
generated by these rooted sets. Conversely, given a convex geometry, one can encode its rooted 
sets into the clauses of a SAT formula with a valid assignment. □ 

The convex geometry corresponding to assignment (1, 1, 1, 1) in Figure 2 is the collection of sets 
of assigned variables in assignments lying below (1, 1, 1, 1): 

{{1, 2, 3, 4}, {2, 3, 4}, {1, 2, 3}, {2, 4}, {2, 3}, {1, 3}, {4}, {2}, {3}, {1}, ∅} 

and the words of the corresponding antimatroid can be read out by going down the directed edges; 
the feasible sets are: 

{∅, {1}, {4}, {1, 3}, {1, 4}, {2, 4}, {1, 2, 3}, {1, 3, 4}, {1, 2, 4}, {2, 3, 4}, {1, 2, 3, 4}} 
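
The two lists above can be reproduced by running the pruning process exhaustively. The sketch below is illustrative only: the function names are hypothetical, and the clause signs, (¬x1 ∨ ¬x2 ∨ x3) ∧ (x2 ∨ ¬x3 ∨ ¬x4), are an assumption under which the search returns exactly the closed sets listed in the text.

```python
STAR = "*"
F = [[-1, -2, 3], [2, -3, -4]]   # assumed signs; -k stands for the negation of x_k


def is_valid(formula, x):
    """x is invalid for a clause whose literals are all 0, or all 0 except a single '*'."""
    for clause in formula:
        vals = []
        for lit in clause:
            v = x[abs(lit) - 1]
            vals.append(STAR if v == STAR else (v if lit > 0 else 1 - v))
        if 1 not in vals and vals.count(STAR) <= 1:
            return False
    return True


def star(x, i):
    """Copy of x with the (1-based) coordinate i replaced by '*'."""
    return x[:i - 1] + (STAR,) + x[i:]


def assignments_below(formula, a):
    """Valid partial assignments reachable from a by starring one unconstrained variable at a time."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for i, v in enumerate(x, start=1):
            if v != STAR and is_valid(formula, star(x, i)) and star(x, i) not in seen:
                seen.add(star(x, i))
                stack.append(star(x, i))
    return seen


a = (1, 1, 1, 1)
closed = {frozenset(i for i, v in enumerate(b, 1) if v != STAR) for b in assignments_below(F, a)}
feasible = {frozenset({1, 2, 3, 4}) - A for A in closed}

print(sorted(sorted(A) for A in closed))
print(sorted(sorted(A) for A in feasible))
```

Note that {1, 4} appears among the feasible sets (as the complement of the closed set {2, 3}) but not among the closed sets, in line with the remark that (1, *, *, 1) is valid yet does not lie below (1, 1, 1, 1).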

In the particular case of 2-SAT, the convex geometry is very special. By Proposition 2.5, the 
poset $P(F)_{\leq a}$ of valid partial assignments is a distributive lattice. 

Notice that a SAT formula $F$ generally has several different valid partial assignments, and 
each assignment $a$ gives rise to a convex geometry $G(F, a)$. These different convex geometries fit 
together nicely, as seen in Figure 2. If $G(F, a)$ and $G(F, b)$ have a non-empty intersection, then 
their intersection is the convex geometry $G(F, c)$ for the unique element $c$ with maximal $N(c)$ for 
which $c_i = *$ if $a_i \neq b_i$, and $c_i = a_i = b_i$ otherwise. 

The machinery that we have built up now provides a more illustrative multivariate version of 
Maneva, Mossel, and Wainwright's Theorem 4.2 on the probability distribution determined by a 
SAT problem $F$ and a valid assignment $a$. More importantly, in view of Theorem 3.1, it tells us 
that the identity of Theorem 4.2 holds precisely because a SAT problem gives rise to a convex 
geometry. Therefore convex geometries are really the context in which this identity should be 
understood. 

Theorem 4.4. For a valid partial assignment $b$ of a Boolean formula $F$ with variables $V$, let 
$S(b)$, $U(b)$ and $C(b)$ denote the sets of star, unconstrained, and constrained variables of $b$ in $F$, 
respectively. Let $p_i$ and $q_i$ be such that $p_i + q_i = 1$ for all $i \in V$. Then, for any valid assignment $a$ 
of a Boolean formula $F$, 

\[ \sum_{b \leq a} \prod_{i \in S(b)} p_i \prod_{j \in U(b)} q_j = 1, \]

summing over all valid assignments $b$ which are less than or equal to $a$ in $P(F)$. 

Proof. The result follows directly from Theorems 3.1 and 4.3. □ 
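
As a sanity check, the multivariate identity can be evaluated exactly for the running example with the satisfying assignment (1, 1, 1, 1). This is a sketch under stated assumptions: the names are hypothetical, the clause signs (¬x1 ∨ ¬x2 ∨ x3) ∧ (x2 ∨ ¬x3 ∨ ¬x4) are assumed, and the weights $p_i$ are arbitrary.

```python
from fractions import Fraction
from math import prod

STAR = "*"
F = [[-1, -2, 3], [2, -3, -4]]   # assumed signs; -k stands for the negation of x_k


def is_valid(formula, x):
    """x is invalid for a clause whose literals are all 0, or all 0 except a single '*'."""
    for clause in formula:
        vals = []
        for lit in clause:
            v = x[abs(lit) - 1]
            vals.append(STAR if v == STAR else (v if lit > 0 else 1 - v))
        if 1 not in vals and vals.count(STAR) <= 1:
            return False
    return True


def star(x, i):
    return x[:i - 1] + (STAR,) + x[i:]


def unconstrained(formula, x):
    """U(x): numerical coordinates that can be starred without losing validity."""
    return [i for i, v in enumerate(x, 1) if v != STAR and is_valid(formula, star(x, i))]


def assignments_below(formula, a):
    """All valid partial assignments b <= a, found by repeated starring."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for i in unconstrained(formula, x):
            y = star(x, i)
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


p = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 5), 4: Fraction(2, 7)}  # arbitrary

total = sum(
    prod(p[i] for i, v in enumerate(b, 1) if v == STAR)       # star variables S(b)
    * prod(1 - p[j] for j in unconstrained(F, b))             # unconstrained variables U(b)
    for b in assignments_below(F, (1, 1, 1, 1))
)
print(total)
```

Setting every $p_i$ to the same value $p$ recovers the univariate weights $p^{|S(b)|} q^{|U(b)|}$ of Theorem 4.2.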